Abstract

We propose a new method to test conditional independence of two real random
variables Y and Z conditionally on an arbitrary third random variable X. The partial
copula is introduced, defined as the joint distribution of $U = F_{Y|X}(Y|X)$ and
$V = F_{Z|X}(Z|X)$. We call this transformation of (Y, Z) into (U, V) the partial copula
transform. It is easy to show that if Y and Z are continuous for any given value
of X, then $Y \perp\!\!\!\perp Z \mid X$ implies $U \perp\!\!\!\perp V$. Conditional independence can then be tested by
(i) applying the partial copula transform to the data points and (ii) applying a test
of ordinary independence to the transformed data. In practice, $F_{Y|X}$ and $F_{Z|X}$ will
need to be estimated, which can be done by, e.g., standard kernel methods. We show
that under easily satisfied conditions, and for a very large class of test statistics for
independence which includes the covariance, Kendall's tau, and Hoeffding's test statistic,
the effect of this estimation vanishes asymptotically. Thus, for large samples, the
estimation can be ignored and we have a simple method which can be used to apply 
a wide range of tests of independence, including ones with consistency for arbitrary 
alternatives, to test for conditional independence. A simulation study indicates good 
small sample performance. Advantages of the partial copula approach compared to 
competitors seem to be simplicity and generality. 

Keywords: partial copula, partial correlation 

1 Introduction 

Random variables Y and Z are conditionally independent given X if, knowing the value of 
X, information about Y does not provide information about Z. Following Dawid (1979), 
this hypothesis is denoted as $Y \perp\!\!\!\perp Z \mid X$. Conditional independence models are the building
blocks of graphical models, which can be used for causal modeling (Wermuth & Cox, 1996;
Lauritzen, 1996; Pearl, 2000). The literature on the topic is usually restricted to the normal 
and categorical cases, but recently nonparametric testing has also received a fair amount 
of attention (Su & White, 2007, 2008; Song, 2009; Huang, 2010; Bouezmarni, Rombouts, 
& Taamouti, 2010). These papers also contain useful further references and applications.
At the end of this section a brief literature review is presented as well. This paper presents 
the partial copula approach for the nonparametric testing of the conditional independence 
hypothesis for real random variables Y and Z controlling for an arbitrary random variable 
X based on n independent and identically distributed (iid) replications of (X, Y, Z).

In general, compared to the testing of ordinary independence, the present problem 
seems a difficult one (Bergsma, 2004). In particular, if, without further assumptions, for a 
given value of X only one (Y, Z) observation is available, this does not tell us anything at
all about the conditional association for that particular value of X. Alternatively, it can 
be said that the difficulty lies in the possible presence of marginal dependencies of Y on 
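X and of Z on X: if both $Y \perp\!\!\!\perp X$ and $Z \perp\!\!\!\perp X$, then $Y \perp\!\!\!\perp Z \mid X$ implies $Y \perp\!\!\!\perp Z$. This is easily
shown as follows: assuming the conditions hold,

$$
\begin{aligned}
P(Y \in A, Z \in B) &= P(Y \in A, Z \in B \mid X = x) \\
&= P(Y \in A \mid X = x)\,P(Z \in B \mid X = x) = P(Y \in A)\,P(Z \in B)
\end{aligned}
\qquad (1)
$$

for measurable subsets A and B of the sample spaces of Y and Z, and x in the sample
space of X; the first equality holds because the final expression does not depend on x.
Thus, in the absence of the marginal associations, conditional independence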
for (X, Y, Z) can be tested simply by performing a test of ordinary independence for (Y, Z). 
In practice, of course, marginal dependencies are typically present, and conditional inde- 
pendence might be tested by removing (some of) the marginal dependency. Below, we 
describe two methods which use this idea. 

Firstly, the idea of removing some of the marginal association is used to construct the 
well-known partial correlation coefficient. The assumption is made that Y = g(X) + U'
and Z = h(X) + V' where the error terms U' and V' have zero expectation conditionally
on X. Now clearly the correlations between X and U' and between X and V' are zero,
i.e., replacing Y by U' and Z by V' removes some of the marginal association. The partial
correlation coefficient is the correlation between U' and V', which is zero under conditional
independence. Thus, the partial correlation coefficient can be used for a nonparametric 
test of conditional independence. 
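
The residual construction behind the partial correlation can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the paper: g and h are estimated by a Gaussian-kernel (Nadaraya-Watson) smoother with a hand-picked bandwidth, the data are synthetic, and the helper names `pearson`, `kernel_smooth` and `partial_correlation` are hypothetical.

```python
import math
import random

def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    saa = sum((p - ma) ** 2 for p in a)
    sbb = sum((q - mb) ** 2 for q in b)
    return sab / math.sqrt(saa * sbb)

def kernel_smooth(xs, ys, bandwidth):
    """Kernel regression estimate of E[Y | X = x_i] at each sample point."""
    fitted = []
    for x0 in xs:
        w = [math.exp(-0.5 * ((x0 - xj) / bandwidth) ** 2) for xj in xs]
        fitted.append(sum(wi * yi for wi, yi in zip(w, ys)) / sum(w))
    return fitted

def partial_correlation(xs, ys, zs, bandwidth):
    """Correlation of the residuals U' = Y - g(X) and V' = Z - h(X)."""
    u_res = [y - f for y, f in zip(ys, kernel_smooth(xs, ys, bandwidth))]
    v_res = [z - f for z, f in zip(zs, kernel_smooth(xs, zs, bandwidth))]
    return pearson(u_res, v_res)

# Synthetic example (an assumption): Y and Z both depend on X, but are
# conditionally independent given X.
rng = random.Random(0)
n = 300
xs = [rng.random() for _ in range(n)]
ys = [2.0 * x + 0.3 * rng.gauss(0.0, 1.0) for x in xs]
zs = [2.0 * x + 0.3 * rng.gauss(0.0, 1.0) for x in xs]

r_marginal = pearson(ys, zs)
r_partial = partial_correlation(xs, ys, zs, bandwidth=0.1)
print(f"marginal correlation: {r_marginal:.3f}, partial correlation: {r_partial:.3f}")
```

In this synthetic setting the marginal correlation is large because both variables follow X, while the correlation of the residuals should be close to zero.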

Secondly, we present the approach of this paper, which has two key distinguishing 
features compared to the partial correlation approach: (i) the use of conditional ranks and 
(ii) complete removal of the marginal dependence. It is based on the new concept of the 
partial copula, which, like the partial correlation, is easily defined. Let X be a random 
variable on an arbitrary measurable space and let Y and Z be real-valued random variables 
with conditional distribution functions $F_{Y|X}$ and $F_{Z|X}$. With

$$U = F_{Y|X}(Y|X) \quad\text{and}\quad V = F_{Z|X}(Z|X)$$ (2)

the partial copula pertaining to Y and Z given X is defined as the joint distribution
function of (U, V) (cf. the ordinary copula pertaining to (Y, Z), which is the joint distribution
function of $(F_Y(Y), F_Z(Z))$). We call transformation (2) the partial copula transform. If both Y
and Z have continuous conditional distribution functions for every given value of X, then
both $U \mid X = x$ and $V \mid X = x$ are uniformly distributed for all x, hence $U \perp\!\!\!\perp X$ and $V \perp\!\!\!\perp X$.
Therefore, by (1), $Y \perp\!\!\!\perp Z \mid X$ implies $U \perp\!\!\!\perp V$. If we now apply the partial copula transform
to iid data $(X_1, Y_1, Z_1), \ldots, (X_n, Y_n, Z_n)$, i.e., setting

$$U_i = F_{Y|X}(Y_i|X_i) \quad\text{and}\quad V_i = F_{Z|X}(Z_i|X_i),$$ (3)

any test of ordinary independence for the $(U_i, V_i)$ is a test for conditional independence for
the $(Y_i, Z_i)$ given the $X_i$.
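
To make transformation (3) concrete, the following Python sketch applies it in a case where the conditional distribution functions are known exactly. The model is an assumption chosen for illustration only: X is uniform on [0, 1] and, given X = x, Y and Z are independent normal variables with mean 4x and unit variance, so that $F_{Y|X}(y|x) = \Phi(y - 4x)$. The helper `pearson` is hypothetical.

```python
import math
import random
from statistics import NormalDist

def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    saa = sum((p - ma) ** 2 for p in a)
    sbb = sum((q - mb) ** 2 for q in b)
    return sab / math.sqrt(saa * sbb)

rng = random.Random(1)
n = 500
phi = NormalDist()  # standard normal, gives the known conditional CDFs

# Assumed model: Y | X = x ~ N(4x, 1) and Z | X = x ~ N(4x, 1),
# with Y and Z independent given X.
xs = [rng.random() for _ in range(n)]
ys = [4.0 * x + rng.gauss(0.0, 1.0) for x in xs]
zs = [4.0 * x + rng.gauss(0.0, 1.0) for x in xs]

# Partial copula transform (3) using the true conditional distribution functions.
us = [phi.cdf(y - 4.0 * x) for x, y in zip(xs, ys)]
vs = [phi.cdf(z - 4.0 * x) for x, z in zip(xs, zs)]

print(f"corr(Y, Z) = {pearson(ys, zs):.3f}")   # marginal dependence induced by X
print(f"corr(U, V) = {pearson(us, vs):.3f}")   # should be near zero
print(f"corr(X, U) = {pearson(xs, us):.3f}")   # U carries no information on X
```

Any test of ordinary independence could be applied to the pairs `(us, vs)`; the correlation is used here only because it is the simplest summary.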

In practice, $F_{Y|X}$ and $F_{Z|X}$ are typically unknown and need to be estimated. In Section 2
it is shown that for a wide variety of estimators, the asymptotic distribution of
a broad class of test statistics in the form of a covariance between two functions on the 
marginals is unaffected by the estimation. Thus, a comprehensive range of tests of ordinary 
independence lead to an asymptotic test of conditional independence. The procedure is 
illustrated using a real data set and several tests of independence, including ones based on 
the covariance and Kendall's tau, and some tests with consistency against any alternative 
such as Hoeffding's test. Questions for further research are addressed in Section 4. 

Note that, since $U \perp\!\!\!\perp V$ does not imply $Y \perp\!\!\!\perp Z \mid X$, a test of the hypothesis $U \perp\!\!\!\perp V$ cannot
be consistent for all alternatives to the hypothesis $Y \perp\!\!\!\perp Z \mid X$: even if $U \perp\!\!\!\perp V$ there may still
be so-called three-variable interaction present. Fortunately, testing for three-variable
interaction seems in general a lot easier than testing for conditional independence
(Blum, Kiefer, & Rosenblatt, 1961), and it is reasonable to use a separate procedure for
that. Using the partial copula transform, we can expect most power against alternatives
with little three-variable interaction, in particular, alternatives for which the conditional
copula, i.e., the joint distribution of $(F_{Y|X}(Y|x), F_{Z|X}(Z|x))$, does not vary much with x.

The following are some recent approaches with the same aim as the present paper. 
Su and White (2007, 2008) compare the conditional densities $f_{Y|XZ}$ and $f_{Y|X}$ using characteristic
functions and Hellinger distances, while Bouezmarni et al. (2010) compare the
conditional distribution functions $F_{Y|XZ}$ and $F_{Y|X}$. Kernel methods are used to estimate
the densities and distribution functions. The methods of Song (2009) and Huang (2010) are
closer to the present paper's, in that they do not require estimation of the distribution of Y
given both X and Z, but only given X. Huang uses Rényi-type maximal correlations and
differs from our approach in that it is not based on conditional ranks. Song also uses the
transformation (3), which he calls the Rosenblatt transform. Song and the present author
found this independently, the present paper having appeared in different form as a technical
report (Bergsma, 2004). Linton and Gozalo (1997) considered conditional independence
tests based on Cramér-von Mises and Kolmogorov-Smirnov criteria.

A brief selection of older approaches is as follows. Kendall (1942) introduced a par- 
tial version of Kendall's tau. However it is not clear how useful this coefficient can be in 
practice, as it is not necessarily zero under conditional independence unless certain restrictive
conditions are met (Korn, 1984). Goodman (1959) and Gripenberg (1992) proposed
a method based on another partial version of Kendall's tau, using the number of local
concordant and discordant pairs of observations. The easiest case for testing conditional
independence is if X is categorical, with sufficiently many observations per category: for 
each category, a test of independence can be done, and these tests can be combined in 
various ways. If all three variables are categorical, log-linear techniques can be used (cf. 
Lauritzen, 1996; Agresti, 2002).

2 A practical approach to conditional independence testing 

If $F_{Y|X}$ and $F_{Z|X}$ are known, the partial copula transform (3) can be applied, followed by an
arbitrary test of ordinary independence on the $(U_i, V_i)$. A permutation test
would then be valid even for small samples. In Section 2.1 a general class of test statistics
for independence testing is described. In practice, however, $F_{Y|X}$ and $F_{Z|X}$ are usually
unknown and need to be estimated; in Section 2.2, we show that, under fairly unrestrictive 
conditions, this estimation does not affect the asymptotic behaviour of the test statistics 
given in Section 2.1. In Section 2.3, we illustrate the procedure on a real-data example 
using several tests of independence and provide some graphical displays. 

2.1 A class of test statistics for ordinary independence 

Many measures of association are of the form 

$$\theta(Y, Z) = E\,s(Y_1, \ldots, Y_r)\,t(Z_1, \ldots, Z_r)$$ (4)

for some r > 1 and where the $(Y_i, Z_i)$ are independent replications of (Y, Z). Square integrability
of s and t is a sufficient condition for the expectation to exist. If s and t have zero
means, then $Y \perp\!\!\!\perp Z$ implies $\theta = 0$. The sample (V-statistic) estimator which can be used
as a test statistic for independence is

$$\hat\theta(Y, Z) = \frac{1}{n^r} \sum_{i_1=1}^{n} \cdots \sum_{i_r=1}^{n} s(Y_{i_1}, \ldots, Y_{i_r})\,t(Z_{i_1}, \ldots, Z_{i_r})$$ (5)

In the absence of distributional assumptions the best way to compute a p-value based on
$\hat\theta$ is usually the permutation test, but alternatively asymptotic theory is well-developed in
certain cases (Hoeffding, 1948; Randles & Wolfe, 1979; Serfling, 1980).
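
The V-statistic (5) and a permutation p-value can be sketched directly from the definitions. The code below is a brute-force illustration (cost of order $n^r$ per evaluation) rather than an efficient implementation. The kernel choices, the two-sided p-value convention and all function names are assumptions made for the example.

```python
import itertools
import random

def v_statistic(ys, zs, s, t, r):
    """V-statistic (5): mean of s(Y_i1..Y_ir) * t(Z_i1..Z_ir) over all n^r index tuples."""
    n = len(ys)
    total = 0.0
    for idx in itertools.product(range(n), repeat=r):
        total += s(*(ys[i] for i in idx)) * t(*(zs[i] for i in idx))
    return total / n ** r

def sign(v):
    return (v > 0) - (v < 0)

# Kernels of order r = 2; each has zero mean for iid arguments.
def cov_s(y1, y2):
    return y1 - y2

def cov_t(z1, z2):
    return (z1 - z2) / 2.0      # with cov_s this yields the covariance (divisor n)

def tau_kernel(a1, a2):
    return sign(a1 - a2)        # used for both s and t: a Kendall's tau type statistic

def tau_stat(ys, zs):
    return v_statistic(ys, zs, tau_kernel, tau_kernel, r=2)

def permutation_p_value(ys, zs, statistic, n_resamples=199, seed=0):
    """Two-sided permutation p-value: shuffle the z's while keeping the y's fixed."""
    rng = random.Random(seed)
    observed = abs(statistic(ys, zs))
    zs_perm = list(zs)
    exceed = 0
    for _ in range(n_resamples):
        rng.shuffle(zs_perm)
        if abs(statistic(ys, zs_perm)) >= observed:
            exceed += 1
    return (1 + exceed) / (1 + n_resamples)

ys = [float(i) for i in range(15)]
zs = [y ** 2 for y in ys]       # monotone in y, hence strongly dependent
theta_cov = v_statistic(ys, zs, cov_s, cov_t, r=2)
theta_tau = tau_stat(ys, zs)
p_tau = permutation_p_value(ys, zs, tau_stat)
print(theta_cov, theta_tau, p_tau)
```

For the covariance kernels the V-statistic reduces algebraically to the mean of the products minus the product of the means, which gives a convenient check of the implementation.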

The ordinary covariance and Kendall's tau are well-known association measures which 
can be written in the form (4). Spearman's rho can of course be written in that form as 
well, but we need not consider it because for rank data it is proportional to the covariance. 
Some coefficients of form (4) which are nonnegative and zero iff independence holds are 
the following. The first is Hoeffding's $\Delta$, which can be defined as

$$\Delta(Y, Z) = \tfrac{1}{4}\,E\,\varphi(Y_1, Y_2, Y_3)\,\varphi(Y_1, Y_4, Y_5)\,\varphi(Z_1, Z_2, Z_3)\,\varphi(Z_1, Z_4, Z_5)$$

where $\varphi(z_1, z_2, z_3) = I(z_1 > z_2) - I(z_1 > z_3)$. Bergsma (2006) introduced

$$\kappa(Y, Z) \propto E\,a(Y_1, Y_2, Y_3, Y_4)\,a(Z_1, Z_2, Z_3, Z_4)$$

and

$$\tau^*(Y, Z) = E\,\mathrm{sign}[a(Y_1, Y_2, Y_3, Y_4)]\,\mathrm{sign}[a(Z_1, Z_2, Z_3, Z_4)]$$

where

$$a(z_1, z_2, z_3, z_4) = |z_1 - z_2| + |z_3 - z_4| - |z_1 - z_3| - |z_2 - z_4|.$$

The latter is an extension of Kendall's tau introduced by Bergsma and Dassios (2010).
Note that $\Delta$, $\kappa$ and $\tau^*$ lead to tests consistent for all alternatives to independence.
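
As an illustration of the kernel a and of the V-statistic estimator of $\tau^*$, a direct Python transcription might look as follows. It enumerates all $n^4$ index quadruples and is therefore only usable for very small n. The function names and the toy data are assumptions made for the example.

```python
import itertools

def a(z1, z2, z3, z4):
    """Kernel a(z1, z2, z3, z4) = |z1 - z2| + |z3 - z4| - |z1 - z3| - |z2 - z4|."""
    return abs(z1 - z2) + abs(z3 - z4) - abs(z1 - z3) - abs(z2 - z4)

def sign(v):
    return (v > 0) - (v < 0)

def tau_star(ys, zs):
    """V-statistic estimate of tau*: average over all n^4 index quadruples."""
    n = len(ys)
    total = 0
    for i, j, k, l in itertools.product(range(n), repeat=4):
        sy = sign(a(ys[i], ys[j], ys[k], ys[l]))
        sz = sign(a(zs[i], zs[j], zs[k], zs[l]))
        total += sy * sz
    return total / n ** 4

# Toy data (an assumption, not from the paper).
ys = [3, 12, -5, 20, 9, -11, 1, 17]
zs = [2, 7, 1, 8, 2, 8, 1, 8]
print(tau_star(ys, zs), tau_star(ys, ys))
```

Because a depends on its arguments only through absolute differences, replacing a sample by its negative leaves the estimate unchanged, so increasing and decreasing relationships are treated alike.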

2.2 Using estimated conditional distributions 

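In practice, $F_{Y|X}$ and $F_{Z|X}$ are typically unknown and need to be estimated. With estimates
$\hat F_{Y|X}$ and $\hat F_{Z|X}$, the partial copula transform yields

$$\hat U_i = \hat F_{Y|X}(Y_i|X_i) \quad\text{and}\quad \hat V_i = \hat F_{Z|X}(Z_i|X_i)$$ (6)

and substitution into (5) gives

$$\hat\theta(\hat U, \hat V) = \frac{1}{n^r} \sum_{i_1=1}^{n} \cdots \sum_{i_r=1}^{n} s(\hat U_{i_1}, \ldots, \hat U_{i_r})\,t(\hat V_{i_1}, \ldots, \hat V_{i_r})$$

Theorem 1 states that under the following easily satisfied conditions, the fact that we are
using estimates $(\hat U_i, \hat V_i)$ rather than true values $(U_i, V_i)$ can be ignored.

A1: s and t are continuous almost everywhere with respect to Lebesgue measure on $[0, 1]^r$,

A2: $n^{\alpha}[\hat\theta(U, V) - \theta(U, V)] = O_p(1)$ for some positive $\alpha$,

A3: for almost all (x, y, z) and some positive $\beta_1$ and $\beta_2$,

$$n^{\beta_1}[\hat F_{Y|X}(y|x) - F_{Y|X}(y|x)] = O_p(1)$$
$$n^{\beta_2}[\hat F_{Z|X}(z|x) - F_{Z|X}(z|x)] = O_p(1)$$

with uniform convergence on all compact sets.

Theorem 1 Assume A1, A2, and A3 hold. Then

$$n^{\alpha}[\hat\theta(\hat U, \hat V) - \theta(U, V)] = n^{\alpha}[\hat\theta(U, V) - \theta(U, V)] + O_p\left(n^{-\min(\beta_1, \beta_2)}\right)$$

Proof: Continuity of s and t (a.e.) implies continuity of s × t (a.e.). Hence,

$$
\begin{aligned}
& s(\hat U_{i_1}, \ldots, \hat U_{i_r})\,t(\hat V_{i_1}, \ldots, \hat V_{i_r}) - \theta(U, V) \\
&\quad = \left[s(U_{i_1}, \ldots, U_{i_r})\,t(V_{i_1}, \ldots, V_{i_r}) - \theta(U, V)\right]\left(1 + O_p\left(n^{-\min(\beta_1, \beta_2)}\right)\right)
\end{aligned}
$$

with probability 1. From this, the uniform convergence, and A2,

$$
\begin{aligned}
n^{\alpha}[\hat\theta(\hat U, \hat V) - \theta(U, V)]
&= n^{\alpha}\left[\frac{1}{n^r} \sum_{i_1=1}^{n} \cdots \sum_{i_r=1}^{n} s(\hat U_{i_1}, \ldots, \hat U_{i_r})\,t(\hat V_{i_1}, \ldots, \hat V_{i_r}) - \theta(U, V)\right] \\
&= n^{\alpha}\,\frac{1}{n^r} \sum_{i_1=1}^{n} \cdots \sum_{i_r=1}^{n} \left[s(U_{i_1}, \ldots, U_{i_r})\,t(V_{i_1}, \ldots, V_{i_r}) - \theta(U, V)\right]\left[1 + O_p\left(n^{-\min(\beta_1, \beta_2)}\right)\right] \\
&= n^{\alpha}[\hat\theta(U, V) - \theta(U, V)]\left[1 + O_p\left(n^{-\min(\beta_1, \beta_2)}\right)\right] \\
&= n^{\alpha}[\hat\theta(U, V) - \theta(U, V)] + O_p\left(n^{-\min(\beta_1, \beta_2)}\right)
\end{aligned}
$$

with probability 1. □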
Note that the proof is immediately adapted to U-statistic estimators. Assumptions A1 and
A2 are satisfied for the measures of association mentioned in Section 2.1, i.e., the covariance,
Kendall's tau, and $\Delta$, $\kappa$ and $\tau^*$. For Euclidean X, Nadaraya-Watson estimators are simple
estimators of the conditional distribution functions satisfying A3 (under some additional
restrictions). Alternatively, Hall, Wolff, and Yao (1999) introduced local linear and local
logistic estimators, which improve on the Nadaraya-Watson ones in certain respects. However,
local linear estimators may not lead to a distribution function, and Hall et al. developed
adjusted estimators which solve this problem.

2.3 Application with graphical illustration 

We now apply the aforementioned tests to a real data example, illustrating the procedure 
with some graphical displays. Table 1 shows data on 35 consecutive patients under treat- 
ment for heart failure with the drug digoxin. The data are from Halkin, Sheiner, Peck, and 
Melmon (1975). Of medical interest is the hypothesis that digoxin clearance is independent 
of urine flow controlling for creatinine clearance, i.e., $Y \perp\!\!\!\perp Z \mid X$. We computed (6) using
the Nadaraya-Watson estimators

$$\hat F_{Y|X}(y|x) = \frac{\sum_{i=1}^{n} K_1(|x - X_i|/h_1)\,I(Y_i \le y)}{\sum_{i=1}^{n} K_1(|x - X_i|/h_1)}$$ (7)

and

$$\hat F_{Z|X}(z|x) = \frac{\sum_{i=1}^{n} K_2(|x - X_i|/h_2)\,I(Z_i \le z)}{\sum_{i=1}^{n} K_2(|x - X_i|/h_2)}$$ (8)

where $h_t > 0$ is a bandwidth and $K_t$ a kernel function (t = 1, 2).

Table 1: Digoxin clearance data. Clearances are given in ml/min/1.73m$^2$, urine flow in
ml/min. Source: Halkin et al. (1975).

Note: X = creatinine clearance, Y = digoxin clearance, Z = urine flow.

We used a standard
normal kernel and chose the bandwidth using Silverman's rule of thumb: $h_1 = h_2 =
1.06\,\sigma_X n^{-1/5} = 22.48$. See Yu and Jones (1998), Hall et al. (1999) and Koenker (2005) for
other methods of estimation and bandwidth selection. In Figure 1(b) scatter plots are
given of the pairs $(X_i, U_i)$ and $(X_i, V_i)$. A visual inspection of both pictures confirms that
the effect of X has been removed, that is, independence seems to hold. A scatterplot of
the $(U_i, V_i)$ is given in Figure 2 and some dependence is apparent, which can be tested for
in various ways; in Table 2, p-values are given for tests based on several statistics. The
p-values were approximated by the permutation test using 10^ resamples. The simulations 
in Section 3 indicate that the tests are likely to be slightly liberal. Even taking that into 
account, there still appears to be good evidence for lack of conditional independence. 
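
The estimation step of this section can be sketched as follows: a Nadaraya-Watson estimate of the conditional distribution function with a standard normal kernel, a rule-of-thumb bandwidth of the form $1.06\,\sigma_X n^{-1/5}$, and the resulting estimated partial copula transform. The sketch runs on synthetic data, not on the digoxin data, and several details are assumptions: the sample standard deviation is used for $\sigma_X$, the indicator $I(Y_j \le y)$ is used in the numerator, and all function names are hypothetical.

```python
import math
import random
import statistics

def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    saa = sum((p - ma) ** 2 for p in a)
    sbb = sum((q - mb) ** 2 for q in b)
    return sab / math.sqrt(saa * sbb)

def silverman_bandwidth(xs):
    """Rule-of-thumb bandwidth 1.06 * sd(x) * n^(-1/5), using the sample sd."""
    return 1.06 * statistics.stdev(xs) * len(xs) ** (-1 / 5)

def nw_conditional_cdf(y0, x0, xs, ys, bandwidth):
    """Nadaraya-Watson estimate of P(Y <= y0 | X = x0) with a standard normal kernel."""
    # The kernel's normalising constant cancels in the ratio, so it is omitted.
    weights = [math.exp(-0.5 * ((x0 - xj) / bandwidth) ** 2) for xj in xs]
    hits = sum(w for w, yj in zip(weights, ys) if yj <= y0)
    return hits / sum(weights)

def estimated_transform(xs, ys, bandwidth):
    """Estimated partial copula transform: F_hat(Y_i | X_i) for each observation."""
    return [nw_conditional_cdf(y, x, xs, ys, bandwidth) for x, y in zip(xs, ys)]

# Synthetic data (an assumption): Y depends strongly on X.
rng = random.Random(2)
n = 200
xs = [rng.random() for _ in range(n)]
ys = [3.0 * x + rng.gauss(0.0, 1.0) for x in xs]

h = silverman_bandwidth(xs)
us = estimated_transform(xs, ys, h)
print(f"bandwidth = {h:.3f}")
print(f"corr(X, Y) = {pearson(xs, ys):.3f}, corr(X, U_hat) = {pearson(xs, us):.3f}")
```

The same function applied to the Z values gives the second coordinate of the transform. Plotting the transformed values against X, as in Figure 1(b), is a quick visual check that the bandwidth is neither so large that dependence on X remains nor so small that the estimate overfits.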

3 Simulation study 

We try to answer the following two questions regarding conditional independence tests 
based on the partial copula: 

1. How good are the type I error rates? 

2. How much loss of power is there compared to the corresponding unconditional test 
of independence? 

(a) Untransformed marginal distributions with association present
(b) Transformed marginal distributions with no apparent association

Figure 1: Illustration that the partial copula transformation of the $Y_i$s into $U_i$s and $Z_i$s into
$V_i$s removes marginal association. By (1), a test of independence for the $(U_i, V_i)$s is a test
of conditional independence for the $(Y_i, Z_i)$s given the $X_i$s.

Figure 2: Partial copula transforms $(U_i, V_i)$ for the digoxin data. Visual inspection suggests
the presence of an association for the $(U_i, V_i)$ which would imply a conditional association
for the $(Y_i, Z_i)$ given the $X_i$.

Test statistic        p-value
$\hat\rho(U, V)$      .018
$\hat\tau(U, V)$      .022
$\hat\Delta(U, V)$    .107
$\hat\kappa(U, V)$    .041
$\hat\tau^*(U, V)$    .055

Table 2: Test statistics and associated p-values for the hypothesis $Y \perp\!\!\!\perp Z \mid X$ for the digoxin
data

Figure 3: Randomly generated data with different noise to signal ratios $\lambda$. Here, $\sigma_0 = 1$
and $\sigma_\varepsilon = \lambda\sigma_0 = \lambda$.






Data were generated according to the following model. We assumed 



$$Y = g(x) + \varepsilon_Y(x) \quad\text{and}\quad Z = h(x) + \varepsilon_Z(x)$$



where the error pairs $(\varepsilon_Y(x), \varepsilon_Z(x))$ follow a bivariate normal distribution with zero mean,
(partial) correlation $\rho_{YZ \cdot X}$ and equal marginal standard deviations. The functions g and
h were generated according to the integrated Wiener processes 



where $W_{1t}$ and $W_{2t}$ are independent Wiener processes. Note that $g(0) = h(0) = 0$, which
is arbitrary but does not affect the simulations. Further note that, with probability one,
g and h are once differentiable. A 'noise to signal ratio' can be defined as $\lambda = \sigma_\varepsilon/\sigma_0$.
In Figure 3, randomly generated (centered) curves and data are plotted with $\sigma_0 = 1$ and
$\lambda \in \{0.1, 0.3, 0.5, 0.7\}$. Finally, the values of x were chosen uniformly on [0, 1].
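
A data-generating scheme of this kind can be sketched in Python as follows. Several ingredients are assumptions rather than details taken from the text: the curves are taken to be $g(x) = \sigma_0 \int_0^x W_{1t}\,dt$ and $h(x) = \sigma_0 \int_0^x W_{2t}\,dt$, the integrals are approximated by a left Riemann sum on an equally spaced grid, the error standard deviation is set to $\sigma_\varepsilon = \lambda\sigma_0$, and the function names are hypothetical.

```python
import math
import random

def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    saa = sum((p - ma) ** 2 for p in a)
    sbb = sum((q - mb) ** 2 for q in b)
    return sab / math.sqrt(saa * sbb)

def integrated_wiener(grid_size, sigma0, rng):
    """Grid values of sigma0 * (integral of a Wiener path from 0 to x), x in [0, 1]."""
    dt = 1.0 / grid_size
    w, integral = 0.0, 0.0
    path = [0.0]                              # the curve is 0 at x = 0
    for _ in range(grid_size):
        integral += w * dt                    # left Riemann sum of the Wiener path
        w += rng.gauss(0.0, math.sqrt(dt))    # Wiener increment over one grid step
        path.append(sigma0 * integral)
    return path

def simulate(n, lam, rho, sigma0=1.0, grid_size=1000, seed=0):
    """Draw n triples (x, y, z) plus the error pairs, for noise to signal ratio lam."""
    rng = random.Random(seed)
    g = integrated_wiener(grid_size, sigma0, rng)
    h = integrated_wiener(grid_size, sigma0, rng)
    sigma_eps = lam * sigma0
    rows = []
    for _ in range(n):
        x = rng.random()                      # x uniform on [0, 1)
        k = int(x * grid_size)                # grid point at or just below x
        e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        eps_y = sigma_eps * e1
        eps_z = sigma_eps * (rho * e1 + math.sqrt(1.0 - rho ** 2) * e2)
        rows.append((x, g[k] + eps_y, h[k] + eps_z, eps_y, eps_z))
    return g, h, rows

g, h, rows = simulate(n=500, lam=0.5, rho=0.8, seed=3)
eps_y = [row[3] for row in rows]
eps_z = [row[4] for row in rows]
print(f"error correlation: {pearson(eps_y, eps_z):.3f}")
```

Setting `rho=0.0` generates data under the null hypothesis of conditional independence, which is the case needed for estimating Type I error rates.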

We simulated data according to the above model and estimated $F_{Y|X}$ and $F_{Z|X}$ using
the Nadaraya-Watson estimators (7) and (8). We took a standard normal kernel and chose
bandwidths according to the formula h\ = h2 = l.lb^ \/n^ which we empirically found to 
yield a kernel that gives a good approximation to the hat-matrix for the posterior mode 
for the above model. For each simulated data set, the p-value for the null hypothesis of 
conditional independence was obtained by performing a permutation test for independence 
on the $\hat U_i$ and $\hat V_i$, where we used the sample correlation coefficient as a test statistic.

In Figure 4, it can be seen that for n = 20 and n = 100, unless $\lambda$ is very small, Type I
error rates are close to nominal and the loss of power due to conditioning seems
reasonable, especially for n = 100. For larger sample sizes the loss of power would become
negligible. The method breaks down when $\lambda$ becomes very small, which is to be expected
as it leads to strong overfitting. 

Figure 5 shows how the choice of bandwidth affects the Type I error rate. In particular 
for n = 100, a wide range of bandwidths gives good error rates.

4 Remarks 

The partial copula is a tool by means of which one's favorite test of independence of appro- 
priate form can be used for (rank) testing of conditional independence. Thus, if desired, 
consistency against arbitrary alternatives can be achieved by choosing an appropriate test 
statistic, as described in the paper. Furthermore, our method is much simpler than com- 
peting methods. For future research, the following questions might be investigated. Firstly, 
both the partial correlation and the partial copula approaches are based on removing as- 
sociation between response and control variables (see Section 1), but in seemingly quite 
different ways. This raises the question whether there is some more general approach to 
removing association. In particular, the case of multivariate response variables Y and Z
remains unsolved. Secondly, in its current form the partial copula approach does not work
for discrete variables. We suspect a solution to this problem can be found by considering
all possible orderings of tied observations, but this needs to be investigated further. Since
excellent alternative methods are available when the explanatory variable is categorical
(Agresti, 2002), this issue does not seem to be too pressing.

Figure 4: Power curves and Type I error rates using $\alpha = 0.05$ for several values of the partial
correlation $\rho$. The dotted lines give the probabilities that $H_0$ is rejected for the corresponding
test of marginal independence.

Figure 5: Type I error rates as a function of bandwidth for data generated using $\lambda = 0.5$.
The vertical dotted lines represent the bandwidths used in Figure 4.